-
Notifications
You must be signed in to change notification settings - Fork 0
Mirror upstream elastic/elasticsearch#133903 for AI review (snapshot of HEAD tree) #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
BASE=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be HEAD=051723c20ef007ff63b8b530a5b3805947404935 Branch=main
WalkthroughCentralizes data stream field sanitization in DataStream and updates RerouteProcessor to use it. Introduces TargetIndex for deriving target indexes and integrates it into OTEL data point grouping and metric document building. Adjusts visibility in MappingHints. Adds comprehensive tests for TargetIndex, grouping behavior, document data_stream output, and ignored attributes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor OTELExporter as OTEL Exporter
participant Ingest as Ingest Pipeline
participant Group as DataPointGroupingContext
participant Target as TargetIndex
participant Builder as MetricDocumentBuilder
participant ES as Elasticsearch
OTELExporter->>Ingest: Send datapoints
Ingest->>Group: Group datapoints by scope/resource
Group->>Group: extract receiverName from scope name
Group->>Target: evaluate(type, attributes, receiverName, scopeAttrs, resourceAttrs)
Target-->>Group: TargetIndex (index or data_stream fields)
Group->>Builder: DataPointGroup with TargetIndex
Builder->>Builder: buildDataStream if isDataStream()
Builder->>ES: Index document to TargetIndex.index()
ES-->>OTELExporter: Ack
sequenceDiagram
autonumber
participant RP as RerouteProcessor
participant DS as DataStream
RP->>DS: sanitizeType(type)
RP->>DS: sanitizeDataset(dataset)
RP->>DS: sanitizeNamespace(namespace)
DS-->>RP: sanitized values
RP->>RP: proceed with routing using sanitized values
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)
167-179
: Optional: also assert data_stream presence in empty-fields test.Since you pass
TargetIndex.defaultMetrics()
, consider assertingdata_stream.*
exists even when resource/scope are empty, to prevent regressions.@@ - documentBuilder.buildMetricDocument(builder, dataPointGroup); + documentBuilder.buildMetricDocument(builder, dataPointGroup); @@ - assertThat(doc.evaluate("unit"), is(nullValue())); + assertThat(doc.evaluate("unit"), is(nullValue())); + assertThat(doc.evaluate("data_stream.type"), equalTo("metrics")); + assertThat(doc.evaluate("data_stream.dataset"), equalTo("generic.otel")); + assertThat(doc.evaluate("data_stream.namespace"), equalTo("default"));x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
63-66
: Consider honoring mapping hints instead of always passing empty.If users provide
elasticsearch.mapping.hints
, propagate them to dynamic template selection.- String dynamicTemplate = dataPoint.getDynamicTemplate(MappingHints.empty()); + // Parse once per group if needed; here kept local for clarity + String dynamicTemplate = dataPoint.getDynamicTemplate( + MappingHints.fromAttributes(dataPointGroup.dataPointAttributes()) + );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java
(2 hunks)server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
(2 hunks)x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java
(6 hunks)x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
(1 hunks)x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java
(2 hunks)x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java
(4 hunks)x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java
(4 hunks)x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java
(1 hunks)x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java
(5 hunks)x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/tsid/AttributeListTsidFunnelTests.java
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
TargetIndex
(21-158)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)
DataStream
(87-2183)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
TargetIndex
(21-158)
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)
server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)
DataStream
(87-2183)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
TargetIndex
(21-158)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
TargetIndex
(21-158)
🔇 Additional comments (17)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (2)
26-26
: Good switch to TargetIndex.defaultMetrics().Using the shared default ensures tests track routing/index construction semantics consistently.
Also applies to: 77-78, 146-147, 178-179
106-108
: Validating data_stream fields is spot on.These assertions correctly reflect the derived defaults: type=metrics, dataset=generic.otel, namespace=default.
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
53-55
: Data stream block integration looks good.Emitting
data_stream
only whentargetIndex.isDataStream()
is true matches the routing model and keeps documents clean for plain-index targets.Also applies to: 115-124
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java (1)
30-31
: No external usages ofMAPPING_HINTS
found; making it private is safe.
Tests use the literal"elasticsearch.mapping.hints"
, so they’re unaffected.modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)
12-12
: Centralizing sanitization approved — DataStream.sanitizeType, sanitizeDataset, and sanitizeNamespace use the same disallowed-character patterns (type & dataset: "[\\/?"<>| ,#:-]", namespace: "[\\/?"<>| ,#:]"), enforce MAX_LENGTH = 100, and replace with “_”. No behavior drift detected.server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)
108-118
: LGTM! Clean public API design.The three public static sanitization methods (
sanitizeType
,sanitizeDataset
,sanitizeNamespace
) provide a clear, well-structured API for data stream field normalization. The delegation to a common private helper ensures consistency.x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java (3)
62-87
: LGTM! Test correctly validates target index-based grouping.The test
testGroupingDifferentTargetIndex
effectively verifies that data points with differentdata_stream.dataset
attributes are grouped separately and routed to distinct target indexes. The use ofcontainsInAnyOrder
is appropriate for unordered collection assertions.
244-266
: LGTM! Test validates receiver-based routing.The test
testReceiverBasedRouting
correctly verifies that the receiver name is extracted from the scope name and used to determine the target index. The expected indexmetrics-hostmetricsreceiver.otel-default
aligns with the receiver namehostmetricsreceiver
parsed from the scope path.
268-289
: LGTM! Test covers edge case for receiver name extraction.The test
testReceiverBasedRoutingWithoutTrailingSlash
validates the edge case where the scope name ends immediately after the receiver name (no trailing path segments). This ensuresextractReceiverName
correctly handles the absence of a trailing slash.x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (3)
23-37
: LGTM! Test correctly validates explicit index precedence.The test verifies that when
elasticsearch.index
is set, it takes precedence overdata_stream.*
attributes, and the result is a non-data-stream target index.
39-53
: LGTM! Test validates data stream attribute handling and sanitization.The test correctly verifies:
- Dataset sanitization:
custom-dataset
→custom_dataset
(hyphen replaced), then.otel
suffix added.- Namespace sanitization:
custom-namespace
remains unchanged (hyphens allowed in namespace).
55-142
: LGTM! Comprehensive test coverage for TargetIndex evaluation logic.The test suite thoroughly validates:
- Scope and resource attribute handling
- Attribute precedence rules (attributes > scope > resource)
- Receiver-based routing
- Default value behavior
- Sanitization of uppercase, hyphens, and special characters
All assertions align with the expected behavior of
TargetIndex.evaluate
andDataStream
sanitization methods.x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (3)
164-194
: LGTM! Receiver name extraction logic is sound.The
extractReceiverName
method correctly parses the receiver name from the scope name by:
- Finding the
/receiver/
segment- Extracting the substring between
/receiver/
and the next/
(or end of string)- Returning
null
if no receiver pattern is foundEdge cases are handled correctly, including scope names ending with the receiver name (no trailing slash).
220-226
: LGTM! TargetIndex.evaluate call is correct.The call to
TargetIndex.evaluate
correctly passes:
- Type:
TargetIndex.TYPE_METRICS
("metrics")- Attributes: data point, scope, and resource attributes in precedence order
- Receiver name: extracted from the scope name
This ensures the target index is computed with all relevant context.
172-172
: LGTM:String
→TargetIndex
consistently applied; tests andbuildDataStream
updated accordingly. Public API change: callers must call.index()
or.toString()
to get the string representation.x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (2)
21-40
: API shape looks good; defaults and caching are practicalThe public API and defaultMetrics() caching read well, and the data-stream vs explicit-index split is clear.
58-64
: All evaluate() call sites use non-null list arguments
DataPoint.getAttributes()/getScopeAttributes()/getResourceAttributes() never return null and tests use List.of(). Safe to add requireNonNull checks for the list parameters.
private static final int MAX_LENGTH = 100; | ||
private static final String REPLACEMENT = "_"; | ||
private static final Pattern DISALLOWED_IN_TYPE = Pattern.compile("[\\\\/*?\"<>| ,#:-]"); | ||
private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]"); | ||
private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
Verify pattern correctness and rationale for identical TYPE and DATASET patterns.
The patterns DISALLOWED_IN_TYPE
and DISALLOWED_IN_DATASET
are identical, both disallowing [\\/*?\"<>| ,#:-]
. However, DISALLOWED_IN_NAMESPACE
differs by not disallowing hyphens (only [\\/*?\"<>| ,#:]
).
Confirm that:
- The regex patterns correctly escape special characters (especially the backslash and forward slash).
- TYPE and DATASET should indeed have identical character restrictions, or if there's a functional difference that should be reflected.
- The choice to allow hyphens in namespace but not in type/dataset is intentional and documented.
Run the following script to check if these patterns are used consistently across the codebase and verify any related documentation:
🏁 Script executed:
#!/bin/bash
# Search for usages of these sanitization methods and any related comments
rg -n -C3 'sanitizeType|sanitizeDataset|sanitizeNamespace|DISALLOWED_IN_TYPE|DISALLOWED_IN_DATASET|DISALLOWED_IN_NAMESPACE'
Length of output: 12218
private static String sanitizeDataStreamField(String s, Pattern disallowedInDataset) { | ||
if (s == null) { | ||
return null; | ||
} | ||
s = s.toLowerCase(Locale.ROOT); | ||
s = s.substring(0, Math.min(s.length(), MAX_LENGTH)); | ||
return disallowedInDataset.matcher(s).replaceAll(REPLACEMENT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Consider edge case: empty or all-invalid input strings.
The sanitization logic correctly handles null input and applies lowercase, length truncation, and character replacement. However, consider these edge cases:
- Empty string input: Returns empty string after sanitization, which may not be a valid dataset/namespace/type.
- All-disallowed-characters input: For example,
"###"
becomes"___"
after sanitization, which might not be semantically meaningful.
Verify that callers handle empty or sanitized-to-invalid results appropriately, or add validation to reject such inputs.
🏁 Script executed:
#!/bin/bash
# Search for callers of these sanitization methods to verify they handle edge cases
rg -n -A5 'sanitizeType\(|sanitizeDataset\(|sanitizeNamespace\(' --type java
Length of output: 3450
Validate empty sanitized values
sanitizeDataStreamField returns ""
for empty or fully-disallowed inputs, but callers only default on null
(e.g. TargetIndex.java 79–89), so empty results slip through. Treat empty sanitized outputs as invalid (null) or add an isEmpty()
check before defaulting.
🤖 Prompt for AI Agents
In server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
around lines 120–127, sanitizeDataStreamField can return an empty string for
inputs that are empty or fully-disallowed which callers only treat null as
invalid; update the method so that after converting, trimming length and
applying the disallowed pattern, if the result is empty (result.isEmpty())
return null instead of an empty string so callers that check for null will
correctly treat fully-sanitized values as invalid.
import io.opentelemetry.proto.common.v1.KeyValue; | ||
|
||
import org.elasticsearch.cluster.metadata.DataStream; | ||
import org.elasticsearch.core.Nullable; | ||
|
||
import java.util.List; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add Objects import to support null-checks in evaluate()
import io.opentelemetry.proto.common.v1.KeyValue;
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.core.Nullable;
import java.util.List;
+import java.util.Objects;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import io.opentelemetry.proto.common.v1.KeyValue; | |
import org.elasticsearch.cluster.metadata.DataStream; | |
import org.elasticsearch.core.Nullable; | |
import java.util.List; | |
import io.opentelemetry.proto.common.v1.KeyValue; | |
import org.elasticsearch.cluster.metadata.DataStream; | |
import org.elasticsearch.core.Nullable; | |
import java.util.List; | |
import java.util.Objects; |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 10 to 16, the evaluate() method needs null-check utilities; add the
import statement `import java.util.Objects;` at the top of the file and update
evaluate() to use Objects (e.g., Objects.requireNonNull(...) or
Objects.isNull(...) as appropriate) for null checks instead of raw null
comparisons to make the logic clearer and avoid potential NPEs.
public static boolean isTargetIndexAttribute(String attributeKey) { | ||
return attributeKey.equals(ELASTICSEARCH_INDEX) | ||
|| attributeKey.equals(DATA_STREAM_DATASET) | ||
|| attributeKey.equals(DATA_STREAM_NAMESPACE); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPE risk in isTargetIndexAttribute(String) when attributeKey is null
Use constant-first equals to avoid null deref.
- public static boolean isTargetIndexAttribute(String attributeKey) {
- return attributeKey.equals(ELASTICSEARCH_INDEX)
- || attributeKey.equals(DATA_STREAM_DATASET)
- || attributeKey.equals(DATA_STREAM_NAMESPACE);
- }
+ public static boolean isTargetIndexAttribute(String attributeKey) {
+ return ELASTICSEARCH_INDEX.equals(attributeKey)
+ || DATA_STREAM_DATASET.equals(attributeKey)
+ || DATA_STREAM_NAMESPACE.equals(attributeKey);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public static boolean isTargetIndexAttribute(String attributeKey) { | |
return attributeKey.equals(ELASTICSEARCH_INDEX) | |
|| attributeKey.equals(DATA_STREAM_DATASET) | |
|| attributeKey.equals(DATA_STREAM_NAMESPACE); | |
} | |
public static boolean isTargetIndexAttribute(String attributeKey) { | |
return ELASTICSEARCH_INDEX.equals(attributeKey) | |
|| DATA_STREAM_DATASET.equals(attributeKey) | |
|| DATA_STREAM_NAMESPACE.equals(attributeKey); | |
} |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 42 to 46, the method isTargetIndexAttribute(String) calls
attributeKey.equals(...) and will NPE if attributeKey is null; change the
comparisons to use constant-first equals (e.g.
ELASTICSEARCH_INDEX.equals(attributeKey) ||
DATA_STREAM_DATASET.equals(attributeKey) ||
DATA_STREAM_NAMESPACE.equals(attributeKey)) or use Objects.equals(constant,
attributeKey) so null attributeKey is handled safely.
public static TargetIndex evaluate( | ||
String type, | ||
List<KeyValue> attributes, | ||
@Nullable String receiverName, | ||
List<KeyValue> scopeAttributes, | ||
List<KeyValue> resourceAttributes | ||
) { | ||
// Order: | ||
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | ||
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | ||
// 3. receiver-based routing based on scope.name | ||
// 4. use default hardcoded data_stream.* (<type>-generic-default) | ||
TargetIndex target = new TargetIndex(); | ||
target.populateFrom(attributes); | ||
target.populateFrom(scopeAttributes); | ||
target.populateFrom(resourceAttributes); | ||
if (target.index == null) { | ||
target.type = type; | ||
if (target.dataset == null && receiverName != null) { | ||
target.dataset = receiverName; | ||
} | ||
target.dataset = DataStream.sanitizeDataset(target.dataset); | ||
if (target.dataset == null) { | ||
target.dataset = DEFAULT_DATASET; | ||
} | ||
// add otel suffix to match OTel index template | ||
target.dataset = target.dataset + OTEL_DATASET_SUFFIX; | ||
target.namespace = DataStream.sanitizeNamespace(target.namespace); | ||
|
||
if (target.namespace == null) { | ||
target.namespace = DEFAULT_NAMESPACE; | ||
} | ||
target.index = target.type + "-" + target.dataset + "-" + target.namespace; | ||
} else { | ||
target.type = null; | ||
target.dataset = null; | ||
target.namespace = null; | ||
} | ||
return target; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Harden evaluate(): validate inputs, sanitize type, and avoid double “.otel” suffix
- Enforce non-null for type and attribute lists.
- Sanitize type via DataStream.sanitizeType(...) to mirror dataset/namespace handling.
- Append OTEL suffix only if not already present.
public static TargetIndex evaluate(
String type,
List<KeyValue> attributes,
@Nullable String receiverName,
List<KeyValue> scopeAttributes,
List<KeyValue> resourceAttributes
) {
+ Objects.requireNonNull(type, "type must not be null");
+ Objects.requireNonNull(attributes, "attributes must not be null");
+ Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null");
+ Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null");
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
- // 3. receiver-based routing based on scope.name
+ // 3. receiver-based routing based on receiverName
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
target.populateFrom(attributes);
target.populateFrom(scopeAttributes);
target.populateFrom(resourceAttributes);
if (target.index == null) {
- target.type = type;
+ // Sanitize type just like dataset/namespace (keeps index name safe).
+ target.type = DataStream.sanitizeType(type);
if (target.dataset == null && receiverName != null) {
target.dataset = receiverName;
}
target.dataset = DataStream.sanitizeDataset(target.dataset);
if (target.dataset == null) {
target.dataset = DEFAULT_DATASET;
}
- // add otel suffix to match OTel index template
- target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+ // add otel suffix to match OTel index template (avoid double-suffix)
+ if (target.dataset.endsWith(OTEL_DATASET_SUFFIX) == false) {
+ target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+ }
target.namespace = DataStream.sanitizeNamespace(target.namespace);
if (target.namespace == null) {
target.namespace = DEFAULT_NAMESPACE;
}
target.index = target.type + "-" + target.dataset + "-" + target.namespace;
Based on relevant code snippet indicating sanitizeType exists in DataStream.
Would you confirm sanitizeType(String) is present in this tree? If not, I’ll switch to a minimal lowercase/length guard instead.
🏁 Script executed:
#!/usr/bin/env bash
rg -nP --type java "sanitizeType" -C3
Length of output: 1796
Harden evaluate(): validate inputs, sanitize type, and avoid double “.otel” suffix
- Add non‐null checks for
type
,attributes
,scopeAttributes
, andresourceAttributes
. - Sanitize
type
viaDataStream.sanitizeType(type)
. - Append
OTEL_DATASET_SUFFIX
only if it’s not already present.
public static TargetIndex evaluate(
String type,
List<KeyValue> attributes,
@Nullable String receiverName,
List<KeyValue> scopeAttributes,
List<KeyValue> resourceAttributes
) {
+ Objects.requireNonNull(type, "type must not be null");
+ Objects.requireNonNull(attributes, "attributes must not be null");
+ Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null");
+ Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null");
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on receiverName
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
target.populateFrom(attributes);
target.populateFrom(scopeAttributes);
target.populateFrom(resourceAttributes);
if (target.index == null) {
- target.type = type;
+ target.type = DataStream.sanitizeType(type);
if (target.dataset == null && receiverName != null) {
target.dataset = receiverName;
}
target.dataset = DataStream.sanitizeDataset(target.dataset);
if (target.dataset == null) {
target.dataset = DEFAULT_DATASET;
}
- // add otel suffix to match OTel index template
- target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+ // add OTEL suffix to match OTel index template (avoid double-suffix)
+ if (target.dataset.endsWith(OTEL_DATASET_SUFFIX) == false) {
+ target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+ }
target.namespace = DataStream.sanitizeNamespace(target.namespace);
if (target.namespace == null) {
target.namespace = DEFAULT_NAMESPACE;
}
target.index = target.type + "-" + target.dataset + "-" + target.namespace;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public static TargetIndex evaluate( | |
String type, | |
List<KeyValue> attributes, | |
@Nullable String receiverName, | |
List<KeyValue> scopeAttributes, | |
List<KeyValue> resourceAttributes | |
) { | |
// Order: | |
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | |
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | |
// 3. receiver-based routing based on scope.name | |
// 4. use default hardcoded data_stream.* (<type>-generic-default) | |
TargetIndex target = new TargetIndex(); | |
target.populateFrom(attributes); | |
target.populateFrom(scopeAttributes); | |
target.populateFrom(resourceAttributes); | |
if (target.index == null) { | |
target.type = type; | |
if (target.dataset == null && receiverName != null) { | |
target.dataset = receiverName; | |
} | |
target.dataset = DataStream.sanitizeDataset(target.dataset); | |
if (target.dataset == null) { | |
target.dataset = DEFAULT_DATASET; | |
} | |
// add otel suffix to match OTel index template | |
target.dataset = target.dataset + OTEL_DATASET_SUFFIX; | |
target.namespace = DataStream.sanitizeNamespace(target.namespace); | |
if (target.namespace == null) { | |
target.namespace = DEFAULT_NAMESPACE; | |
} | |
target.index = target.type + "-" + target.dataset + "-" + target.namespace; | |
} else { | |
target.type = null; | |
target.dataset = null; | |
target.namespace = null; | |
} | |
return target; | |
} | |
public static TargetIndex evaluate( | |
String type, | |
List<KeyValue> attributes, | |
@Nullable String receiverName, | |
List<KeyValue> scopeAttributes, | |
List<KeyValue> resourceAttributes | |
) { | |
Objects.requireNonNull(type, "type must not be null"); | |
Objects.requireNonNull(attributes, "attributes must not be null"); | |
Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null"); | |
Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null"); | |
// Order: | |
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | |
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | |
// 3. receiver-based routing based on receiverName | |
// 4. use default hardcoded data_stream.* (<type>-generic-default) | |
TargetIndex target = new TargetIndex(); | |
target.populateFrom(attributes); | |
target.populateFrom(scopeAttributes); | |
target.populateFrom(resourceAttributes); | |
if (target.index == null) { | |
target.type = DataStream.sanitizeType(type); | |
if (target.dataset == null && receiverName != null) { | |
target.dataset = receiverName; | |
} | |
target.dataset = DataStream.sanitizeDataset(target.dataset); | |
if (target.dataset == null) { | |
target.dataset = DEFAULT_DATASET; | |
} | |
// add OTEL suffix to match OTel index template (avoid double-suffix) | |
if (!target.dataset.endsWith(OTEL_DATASET_SUFFIX)) { | |
target.dataset = target.dataset + OTEL_DATASET_SUFFIX; | |
} | |
target.namespace = DataStream.sanitizeNamespace(target.namespace); | |
if (target.namespace == null) { | |
target.namespace = DEFAULT_NAMESPACE; | |
} | |
target.index = target.type + "-" + target.dataset + "-" + target.namespace; | |
} else { | |
target.type = null; | |
target.dataset = null; | |
target.namespace = null; | |
} | |
return target; | |
} |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 58-97, the evaluate(...) method should validate and sanitize
inputs: add explicit non-null checks for type, attributes, scopeAttributes, and
resourceAttributes (throw an IllegalArgumentException with a clear message if
any are null), sanitize the incoming type by calling
DataStream.sanitizeType(type) and use that sanitized value for target.type, and
when appending OTEL_DATASET_SUFFIX only add it if target.dataset does not
already end with that suffix (use a null-safe check after sanitization). Also
ensure you handle receiverName possibly null as before and avoid any operations
on null lists to prevent NPEs.
// Order: | ||
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | ||
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | ||
// 3. receiver-based routing based on scope.name | ||
// 4. use default hardcoded data_stream.* (<type>-generic-default) | ||
TargetIndex target = new TargetIndex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment nit: align step description with parameter name
Replace “scope.name” with “receiverName” to match the method contract and upstream usage.
- // 3. receiver-based routing based on scope.name
+ // 3. receiver-based routing based on receiverName
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Order: | |
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | |
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | |
// 3. receiver-based routing based on scope.name | |
// 4. use default hardcoded data_stream.* (<type>-generic-default) | |
TargetIndex target = new TargetIndex(); | |
// Order: | |
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes | |
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes | |
// 3. receiver-based routing based on receiverName | |
// 4. use default hardcoded data_stream.* (<type>-generic-default) | |
TargetIndex target = new TargetIndex(); |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 65 to 70, the comment listing routing steps refers to "scope.name"
but the method/parameter is named "receiverName"; update the comment to replace
"scope.name" with "receiverName" so the step description aligns with the method
contract and upstream usage.
private void populateFrom(List<KeyValue> attributes) { | ||
if (isPopulated()) { | ||
return; | ||
} | ||
for (int i = 0, size = attributes.size(); i < size; i++) { | ||
KeyValue attr = attributes.get(i); | ||
if (attr.getKey().equals(ELASTICSEARCH_INDEX)) { | ||
index = attr.getValue().getStringValue(); | ||
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) { | ||
dataset = attr.getValue().getStringValue(); | ||
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) { | ||
namespace = attr.getValue().getStringValue(); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Make “first wins” consistent and exit early when elasticsearch.index is found
Currently, multiple ELASTICSEARCH_INDEX entries in the same list cause “last wins,” unlike dataset/namespace. Prefer “first wins” and return immediately once index is set.
private void populateFrom(List<KeyValue> attributes) {
if (isPopulated()) {
return;
}
for (int i = 0, size = attributes.size(); i < size; i++) {
KeyValue attr = attributes.get(i);
- if (attr.getKey().equals(ELASTICSEARCH_INDEX)) {
- index = attr.getValue().getStringValue();
+ if (index == null && attr.getKey().equals(ELASTICSEARCH_INDEX)) {
+ index = attr.getValue().getStringValue();
+ if (index != null) {
+ return; // explicit index takes precedence; stop scanning this list
+ }
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) {
dataset = attr.getValue().getStringValue();
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) {
namespace = attr.getValue().getStringValue();
}
}
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
private void populateFrom(List<KeyValue> attributes) { | |
if (isPopulated()) { | |
return; | |
} | |
for (int i = 0, size = attributes.size(); i < size; i++) { | |
KeyValue attr = attributes.get(i); | |
if (attr.getKey().equals(ELASTICSEARCH_INDEX)) { | |
index = attr.getValue().getStringValue(); | |
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) { | |
dataset = attr.getValue().getStringValue(); | |
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) { | |
namespace = attr.getValue().getStringValue(); | |
} | |
} | |
} | |
private void populateFrom(List<KeyValue> attributes) { | |
if (isPopulated()) { | |
return; | |
} | |
for (int i = 0, size = attributes.size(); i < size; i++) { | |
KeyValue attr = attributes.get(i); | |
if (index == null && attr.getKey().equals(ELASTICSEARCH_INDEX)) { | |
index = attr.getValue().getStringValue(); | |
if (index != null) { | |
return; // explicit index takes precedence; stop scanning this list | |
} | |
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) { | |
dataset = attr.getValue().getStringValue(); | |
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) { | |
namespace = attr.getValue().getStringValue(); | |
} | |
} | |
} |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 101 to 115, the current loop allows multiple ELASTICSEARCH_INDEX
attributes to overwrite index (last-wins) while dataset/namespace follow
first-wins; change the index handling to follow first-wins and exit early: only
set index if index == null when encountering ELASTICSEARCH_INDEX, and
immediately break/return once index is set so subsequent attributes cannot
overwrite it, leaving dataset and namespace logic unchanged.
public static boolean isMappingHintsAttribute(String attributeKey) { | ||
return attributeKey.equals(MAPPING_HINTS); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Null-safety for attribute key check.
Defensively guard against null
to avoid accidental NPEs when called from other contexts.
- public static boolean isMappingHintsAttribute(String attributeKey) {
- return attributeKey.equals(MAPPING_HINTS);
- }
+ public static boolean isMappingHintsAttribute(String attributeKey) {
+ return MAPPING_HINTS.equals(attributeKey);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public static boolean isMappingHintsAttribute(String attributeKey) { | |
return attributeKey.equals(MAPPING_HINTS); | |
} | |
public static boolean isMappingHintsAttribute(String attributeKey) { | |
return MAPPING_HINTS.equals(attributeKey); | |
} |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java
around lines 65-67, the method isMappingHintsAttribute(String attributeKey)
calls attributeKey.equals(MAPPING_HINTS) which can throw NPE if attributeKey is
null; change the check to be null-safe by either using
MAPPING_HINTS.equals(attributeKey) or Objects.equals(attributeKey,
MAPPING_HINTS) (and add an import for java.util.Objects if used), so the method
returns false for null inputs without throwing.
public static boolean isIgnoredAttribute(String attributeKey) { | ||
return attributeKey.equals(MappingHints.MAPPING_HINTS); | ||
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Add a null-guard to avoid accidental NPEs.
Even if OTel keys are non-null, this public method may be reused. Guarding costs nothing.
- public static boolean isIgnoredAttribute(String attributeKey) {
- return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
- }
+ public static boolean isIgnoredAttribute(String attributeKey) {
+ if (attributeKey == null) return false;
+ return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public static boolean isIgnoredAttribute(String attributeKey) { | |
return attributeKey.equals(MappingHints.MAPPING_HINTS); | |
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey); | |
} | |
public static boolean isIgnoredAttribute(String attributeKey) { | |
if (attributeKey == null) return false; | |
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey); | |
} |
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java
around lines 145-147, the public isIgnoredAttribute method can throw an NPE if
attributeKey is null; add a null-guard at the top (e.g., if (attributeKey ==
null) return false;) before calling TargetIndex.isTargetIndexAttribute(...) and
MappingHints.isMappingHintsAttribute(...), so the method safely returns false
for null inputs.
import static org.hamcrest.Matchers.equalTo; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Nice coverage for ignored attributes.
This ensures TSID stays stable by excluding routing-only keys.
Add the mapping hints key as well:
@@
- List.of(
+ List.of(
keyValue("elasticsearch.index", "index"),
keyValue("data_stream.dataset", "dataset"),
- keyValue("data_stream.namespace", "namespace")
+ keyValue("data_stream.namespace", "namespace"),
+ keyValue("elasticsearch.mapping.hints", "ignored")
),
funnelBuilder
);
assertThat(funnelBuilder.size(), equalTo(0));
Also applies to: 60-70
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/tsid/AttributeListTsidFunnelTests.java
around lines 18-19 (and also update the same logic at lines 60-70), the test's
ignored-attributes list is missing the mapping hints key; update the test to
include the mapping hints key in the set of attributes that are excluded from
TSID computation (add the mapping hints constant/string to the ignored
attributes collection where routing-only keys are specified) so the test
verifies TSID stability with that key ignored as well.
Single commit with tree=051723c20ef007ff63b8b530a5b3805947404935^{tree}, parent=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be. Exact snapshot of upstream PR head. No conflict resolution attempted.
Summary by CodeRabbit